XML: URL Data Set Creation for Future Web Mining Research Avenues

Authors

  • Krishna Murthy
  • A Suresha
Abstract

The rapid expansion of the internet has made the web a popular place for disseminating and collecting information, and it has also opened many research topics in various fields. Over the last few years, several attempts have been made at web-based research, particularly on HTML web pages because of their wide availability. As a result, many research data sets have been created, and a few of them have been made available on the web. However, the W3 Consortium has stated that HTML does not provide an adequate description of the semantic structure of web page contents. To overcome this drawback, web developers started to build web pages with newer technologies such as XML and Flash [1]. This opens the way for new research methods. This article focuses on creating a data set of XML web pages using sequential search, link extraction, and string-based classification methods, to support future research on XML web pages.
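The three steps named in the abstract can be illustrated with a small, offline sketch. It assumes that "link extraction" means collecting the `href` targets of `<a>` tags, that "sequential search" is a single linear pass over those links in page order, and that "string-based classification" labels a URL by its file suffix. The suffix list `XML_SUFFIXES` and the helper names `classify_url` and `build_dataset` are hypothetical and are not taken from the paper.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Hypothetical suffixes treated as "XML pages"; the paper's actual rule is not given.
XML_SUFFIXES = (".xml", ".rss", ".atom")


class LinkExtractor(HTMLParser):
    """Link extraction: collect the href of every <a> tag, resolved against base_url."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.base_url, value))


def classify_url(url):
    """String-based classification: 'xml' if the URL path ends with a known suffix."""
    path = urlparse(url).path.lower()
    return "xml" if path.endswith(XML_SUFFIXES) else "other"


def build_dataset(html, base_url):
    """Sequential search: one linear pass over extracted links, skipping duplicates."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    seen = set()
    dataset = []
    for url in parser.links:  # links are visited in the order they appear on the page
        if url in seen:
            continue
        seen.add(url)
        dataset.append((url, classify_url(url)))
    return dataset


if __name__ == "__main__":
    page = """
    <html><body>
      <a href="catalog.xml">Catalog</a>
      <a href="/news/feed.RSS">Feed</a>
      <a href="about.html">About</a>
      <a href="catalog.xml">Catalog again</a>
    </body></html>
    """
    for url, label in build_dataset(page, "http://example.com/docs/"):
        print(label, url)
```

Classifying by URL string alone is cheap because no page has to be fetched, but it misses XML documents served without a telltale suffix; checking the HTTP `Content-Type` header would be more robust at the cost of one request per URL.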

Similar resources

Rule Learning from Semi-structured Documents by Inductive Logic Programming

One of the hot research areas is knowledge discovery on structured documents such as HTML and XML documents. For XML documents, the most popular approach to mining knowledge is the structural approach, which finds similar patterns (often tree structures or XPath expressions) in the XML documents of interest. On the other hand, there are relational data mining approaches such as ILP (Inductive Logic Progra...

Prioritize the ordering of URL queue in Focused crawler

The enormous growth of the World Wide Web in recent years has made it necessary to perform resource discovery efficiently. For a crawler, it is not a simple task to download domain-specific web pages, and this unfocused approach often yields undesired results. Therefore, several new ideas have been proposed; among them, a key technique is focused crawling, which is able to crawl particular topical...

Processing biological literature with customizable Web services supporting interoperable formats

Web services have become a popular means of interconnecting solutions for processing a body of scientific literature. This has fuelled research on high-level data exchange formats suitable for a given domain and ensuring the interoperability of Web services. In this article, we focus on the biological domain and consider four interoperability formats, BioC, BioNLP, XMI and RDF, that represent d...

XML structural delta mining: Issues and challenges

Recently, there have been increasing research efforts in XML data mining. These efforts have largely assumed that XML documents are static. In reality, however, documents are rarely static. In this paper, we propose a novel research problem called XML structural delta mining. The objective of XML structural delta mining is to discover knowledge by analyzing structural evolution patterns (als...

Mining Historical Xml

Nowadays the Web presents itself as the largest data repository ever available in the history of humankind (Reis et al., 2004). However, the availability of a huge amount of Web data does not mean that users can find whatever they want more easily. On the contrary, the massive amount of data on the Web has overwhelmed their ability to find the desired information. It has been claimed that 99% of t...

Publication date: 2014